A Stochastic Finite-State Morphological Parser for Turkish
Abstract
This paper presents the first stochastic finite-state morphological parser for Turkish. The non-probabilistic parser is a standard finite-state transducer implementation of the two-level morphology formalism. A disambiguated text corpus of 200 million words is used to stochastize the morphotactics transducer, which is then composed with the morphophonemics transducer to obtain a stochastic morphological parser. We present two applications to evaluate the effectiveness of the stochastic parser: spelling correction and morphology-based language modeling for speech recognition.
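The stochastization step can be illustrated with a small sketch. This is a toy built on stated assumptions, not the authors' implementation: it uses a hypothetical noun morphotactics automaton (`ARCS`, `FINAL`) with illustrative tags, estimates P(tag | state) from a tiny made-up disambiguated corpus by add-one-smoothed relative frequency (the paper's exact estimation scheme may differ), stores each arc as the weight -log P, and ranks the competing analyses of the ambiguous form "evleri" by total weight. Composition with the morphophonemics transducer, which maps these lexical tag paths to surface strings, is omitted.

```python
import math
from collections import Counter

# Hypothetical toy morphotactics automaton for nouns (not the paper's lexicon).
# Each state maps a morpheme tag to the state reached after emitting that tag.
ARCS = {
    "start": {"Noun": "noun"},
    "noun": {"A3sg": "agr", "A3pl": "agr"},
    "agr": {"Pnon": "poss", "P3sg": "poss", "P3pl": "poss"},
    "poss": {"Nom": "end", "Acc": "end", "Loc": "end"},
    "end": {},
}
FINAL = {"end"}
STOP = "</s>"  # pseudo-event: "the path ends in this final state"


def train(analyses, alpha=1.0):
    """Return {(state, event): -log P(event | state)} from disambiguated tag paths.

    Probabilities are add-alpha smoothed relative frequencies, normalized per
    state over its outgoing tags plus STOP (if the state is final).
    """
    counts = Counter()
    for tags in analyses:
        state = "start"
        for tag in tags:
            counts[(state, tag)] += 1
            state = ARCS[state][tag]  # KeyError: path violates the morphotactics
        counts[(state, STOP)] += 1
    weights = {}
    for state, out in ARCS.items():
        events = list(out) + ([STOP] if state in FINAL else [])
        total = sum(counts[(state, e)] for e in events) + alpha * len(events)
        for e in events:
            weights[(state, e)] = -math.log((counts[(state, e)] + alpha) / total)
    return weights


def score(tags, weights):
    """Sum of arc weights along one analysis path (lower = more probable)."""
    state, cost = "start", 0.0
    for tag in tags:
        cost += weights[(state, tag)]
        state = ARCS[state][tag]
    return cost + weights[(state, STOP)]


# Made-up disambiguated "corpus" of tag paths (illustrative only).
corpus = [
    ("Noun", "A3pl", "Pnon", "Nom"),
    ("Noun", "A3pl", "Pnon", "Acc"),
    ("Noun", "A3pl", "Pnon", "Acc"),
    ("Noun", "A3sg", "P3sg", "Nom"),
    ("Noun", "A3sg", "Pnon", "Loc"),
]
weights = train(corpus)

# Competing analyses of the ambiguous surface form "evleri".
candidates = [
    ("Noun", "A3pl", "Pnon", "Acc"),  # 'the houses' (accusative)
    ("Noun", "A3sg", "P3pl", "Nom"),  # 'their house'
    ("Noun", "A3pl", "P3sg", "Nom"),  # 'his/her houses'
    ("Noun", "A3pl", "P3pl", "Nom"),  # 'their houses'
]
best = min(candidates, key=lambda t: score(t, weights))

for tags in sorted(candidates, key=lambda t: score(t, weights)):
    print("+".join(tags), round(math.exp(-score(tags, weights)), 4))
```

Because each weight is a negative log probability, the cost of a path is the sum of its arc weights, which matches how weights combine along paths and under composition in the tropical semiring; the lowest-cost analysis is the most probable one.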
Similar resources
Resources for Turkish morphological processing
We present a set of language resources and tools—a morphological parser, a morphological disambiguator, and a text corpus—for exploiting Turkish morphology in natural language processing applications. The morphological parser is a state-of-the-art finite-state transducer-based implementation of Turkish morphology. The disambiguator is based on the averaged perceptron algorithm and has the best ...
Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus
In this paper, we propose a set of language resources for building Turkish language processing applications. Specifically, we present a finite-state implementation of a morphological parser, an averaged perceptron-based morphological disambiguator, and a compiled web corpus. Turkish is an agglutinative language with a highly productive inflectional and derivational morphology. We present ...
A Modular Approach to Turkish Noun Compounding: The Integration of a Finite-State Model
In this paper, we describe the design and integration of a three-level cascaded non-deterministic finite-state model of Turkish compounding into Turkish PAPPI, a comprehensive syntactic parser in the principles-and-parameters (P&P) framework. Our approach is to handle compounding as an intermediate stage between morphological analysis and syntactic parsing. We discuss how the compounding machine ...
A Character Recognizer for Turkish Language
This paper presents a contextual post-processing subsystem for a Turkish machine-printed character recognition system. The contextual post-processing subsystem is based on positional binary 3-gram statistics for Turkish, an error corrector parser, and a lexicon containing root words and their inflected forms. The error corrector parser is used for correcting...
Tagging and Morphological Disambiguation of Turkish Text
Automatic text tagging is an important component in higher-level analysis of text corpora, and its output can be used in many natural language processing applications. In languages with agglutinative morphology, such as Turkish or Finnish, morphological disambiguation is a crucial step in tagging, since many lexical forms are morphologically ambiguous. This paper describes ...